AITopics | block coordinate descent

block coordinate descent

Fast Sparse Group Lasso

Yasutoshi Ida, Yasuhiro Fujiwara, Hisashi Kashima

Neural Information Processing Systems | Feb-14-2026, 07:54:53 GMT

However, as an update of only one parameter group depends on all the parameter groups or data points, the computation cost is high when the number of the parameters or data points is large. This paper proposes a fast Block Coordinate Descent for Sparse Group Lasso.

artificial intelligence, equation, machine learning, (16 more...)

Neural Information Processing Systems

Country:

North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)
Asia > Japan > Honshū > Kansai > Kyoto Prefecture > Kyoto (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.47)

Fast Sparse Group Lasso

Neural Information Processing Systems | Dec-26-2025, 00:18:10 GMT

Sparse Group Lasso is a method of linear regression analysis that finds sparse parameters in terms of both feature groups and individual features. Block Coordinate Descent is a standard approach to obtain the parameters of Sparse Group Lasso, and iteratively updates the parameters for each parameter group. However, as an update of only one parameter group depends on all the parameter groups or data points, the computation cost is high when the number of the parameters or data points is large. This paper proposes a fast Block Coordinate Descent for Sparse Group Lasso. It efficiently skips the updates of the groups whose parameters must be zeros by using the parameters in one group. In addition, it preferentially updates parameters in a candidate group set, which contains groups whose parameters must not be zeros. Theoretically, our approach guarantees the same results as the original Block Coordinate Descent. Experiments show that our algorithm enhances the efficiency of the original algorithm without any loss of accuracy.
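To see how a single block update can set an entire group to zero, here is a minimal sketch of the proximal operator of a sparse-group penalty applied to one group. It is the standard operator for a penalty of the form l1 * ||v||_1 + l2 * ||v||_2, not the skipping criterion proposed in the paper. The function names and the weights `l1` and `l2` are assumptions introduced for illustration.

```python
import math


def soft_threshold(z, t):
    """Elementwise soft-thresholding: shrink each entry toward zero by t."""
    return [math.copysign(max(abs(v) - t, 0.0), v) for v in z]


def sgl_block_prox(z, l1, l2):
    """Proximal operator of l1*||v||_1 + l2*||v||_2 for one group (hypothetical helper).

    Step 1 enforces sparsity within the group (individual features).
    Step 2 shrinks the whole group and zeroes it when its norm is at most l2.
    """
    s = soft_threshold(z, l1)
    norm = math.sqrt(sum(v * v for v in s))
    if norm <= l2:
        # The whole group is set to zero.
        return [0.0] * len(z)
    scale = 1.0 - l2 / norm
    return [scale * v for v in s]


if __name__ == "__main__":
    print(sgl_block_prox([3.0, -0.5], 1.0, 1.0))  # group survives, one feature zeroed
    print(sgl_block_prox([1.2, -1.1], 1.0, 1.0))  # whole group zeroed
```

A group whose soft-thresholded input has norm at most `l2` comes out as all zeros. That is the kind of update an algorithm could skip if it knew the outcome in advance.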

block coordinate descent, name change, sparse group lasso, (3 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.61)

Convergent Block Coordinate Descent for Training Tikhonov Regularized Deep Neural Networks

Ziming Zhang, Matthew Brand

Neural Information Processing Systems | Nov-21-2025, 10:26:23 GMT

Therefore, it does not suffer from vanishing gradient at all.

algorithm, convergence, optimization, (13 more...)

Neural Information Processing Systems

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
North America > United States > California > Los Angeles County > Long Beach (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Instructional Material (0.46)

Industry: Education (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.66)

Block Coordinate Descent for Neural Networks Provably Finds Global Minima

Akiyama, Shunta

arXiv.org Machine Learning | Oct-28-2025

In this paper, we consider a block coordinate descent (BCD) algorithm for training deep neural networks and provide a new global convergence guarantee under strictly monotonically increasing activation functions. While existing works demonstrate convergence to stationary points for BCD in neural networks, our contribution is the first to prove convergence to global minima, ensuring arbitrarily small loss. We show that the loss with respect to the output layer decreases exponentially while the loss with respect to the hidden layers remains well-controlled. Additionally, we derive generalization bounds using the Rademacher complexity framework, demonstrating that BCD not only achieves strong optimization guarantees but also provides favorable generalization performance. Moreover, we propose a modified BCD algorithm with skip connections and non-negative projection, extending our convergence guarantees to the ReLU activation, which is not strictly monotonic. Empirical experiments confirm our theoretical findings, showing that the BCD algorithm achieves a small loss for strictly monotonic and ReLU activations.

artificial intelligence, machine learning, neural network, (17 more...)

arXiv.org Machine Learning

arXiv:2510.22667

Country: Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.67)

Export Reviews, Discussions, Author Feedback and Meta-Reviews

Neural Information Processing Systems | Oct-2-2025, 19:57:23 GMT

First provide a summary of the paper, and then address the following criteria: Quality, clarity, originality and significance. Summary: The authors propose a new Newton-like method to optimize the sum of a smooth (convex) cost function and multiple decomposable norms. Their contributions are (1) an active subspace selection procedure that speeds up the solution of the quadratic approximation problem, and (2) a proof that solving the quadratic approximation problem over the (changing) active subspace still leads to convergence. The authors also provide numerical results showing that, for two important problems, their method gives a 10x speedup over state-of-the-art methods and, in the appendix, give numerical results that illustrate which fraction of the speedup is due to the quadratic approximation technique and which fraction is due to the active subspace selection method. Quality: The amount of critical information in the appendix makes this paper more suited for a journal than a conference.

algorithm, convergence, optimization problem, (11 more...)

Neural Information Processing Systems

Country: North America > Canada > Quebec > Montreal (0.04)

Genre: Research Report (0.54)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Fast Sparse Group Lasso

Yasutoshi Ida, Yasuhiro Fujiwara, Hisashi Kashima

Neural Information Processing Systems | Aug-20-2025, 03:41:44 GMT

Sparse Group Lasso is a method of linear regression analysis that finds sparse parameters in terms of both feature groups and individual features.
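For reference, one commonly used way to write the Sparse Group Lasso objective is sketched below. The listing itself does not state the objective, so every symbol here is an assumption: n data points, G groups, group g with p_g features, design block X^{(g)}, coefficients \beta^{(g)}, a regularization strength \lambda > 0, and a mixing weight \alpha in [0, 1].

```latex
\min_{\beta \in \mathbb{R}^{p}} \;
\frac{1}{2n} \Big\| y - \sum_{g=1}^{G} X^{(g)} \beta^{(g)} \Big\|_2^2
\;+\; (1-\alpha)\,\lambda \sum_{g=1}^{G} \sqrt{p_g}\, \big\| \beta^{(g)} \big\|_2
\;+\; \alpha\,\lambda\, \| \beta \|_1
```

The group-wise \ell_2 term drives whole groups to zero, and the \ell_1 term drives individual features to zero. Together they give sparsity at both the group level and the feature level.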

candidate group, equation, vector, (13 more...)

Neural Information Processing Systems

Country:

Asia > Japan > Honshū > Kansai > Kyoto Prefecture > Kyoto (0.04)
North America > Canada (0.04)

Genre: Research Report > New Finding (0.48)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.69)

Accelerated Mini-batch Randomized Block Coordinate Descent Method

Tuo Zhao, Mo Yu, Yiming Wang, Raman Arora, Han Liu

Neural Information Processing Systems | Feb-8-2025, 21:52:37 GMT

We consider regularized empirical risk minimization problems. In particular, we minimize the sum of a smooth empirical risk function and a nonsmooth regularization function. When the regularization function is block separable, we can solve the minimization problems in a randomized block coordinate descent (RBCD) manner. Existing RBCD methods usually decrease the objective value by exploiting the partial gradient of a randomly selected block of coordinates in each iteration. Thus they need all data to be accessible so that the partial gradient of the selected block can be obtained exactly.
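The basic RBCD step described above can be sketched as follows, assuming a lasso objective (squared loss plus an l1 regularizer, which is block separable). This is the plain full-data variant, in which each partial gradient uses every data point. It is not the accelerated mini-batch method the paper proposes. The names `rbcd_lasso` and `objective`, the step-size bound, and the toy data are assumptions made for illustration.

```python
import random


def soft_threshold(v, t):
    """Scalar soft-thresholding, the proximal operator of t*|v|."""
    return max(abs(v) - t, 0.0) * (1.0 if v > 0 else -1.0)


def objective(X, y, w, lam):
    """(1/2n)*||Xw - y||^2 + lam*||w||_1."""
    n = len(y)
    resid = [sum(a * b for a, b in zip(row, w)) - yi for row, yi in zip(X, y)]
    return sum(r * r for r in resid) / (2 * n) + lam * sum(abs(v) for v in w)


def rbcd_lasso(X, y, blocks, lam, iters=200, seed=0):
    """Randomized block coordinate proximal-gradient descent (hypothetical sketch)."""
    rng = random.Random(seed)
    n, d = len(y), len(X[0])
    w = [0.0] * d
    resid = [-yi for yi in y]  # holds Xw - y, kept up to date incrementally
    # Upper bound on each block's Lipschitz constant: trace of X_b^T X_b / n.
    lips = [sum(X[i][j] ** 2 for j in b for i in range(n)) / n for b in blocks]
    for _ in range(iters):
        k = rng.randrange(len(blocks))  # pick one block uniformly at random
        b, L = blocks[k], lips[k]
        if L == 0.0:
            continue
        # Partial gradient of the smooth part w.r.t. the chosen block (uses all data).
        grad = [sum(X[i][j] * resid[i] for i in range(n)) / n for j in b]
        for j, g in zip(b, grad):
            new = soft_threshold(w[j] - g / L, lam / L)
            delta = new - w[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] += X[i][j] * delta
                w[j] = new
    return w


if __name__ == "__main__":
    X = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0], [1.0, 1.0, 0.0]]
    y = [2.0, 0.0, -1.0, 2.0]
    w = rbcd_lasso(X, y, blocks=[[0, 1], [2]], lam=0.1)
    print(w, objective(X, y, w, 0.1))
```

The gradient computation touches every row of `X`. This is the full-data access requirement that the abstract identifies as the limitation of existing RBCD methods.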

artificial intelligence, gradient, machine learning, (18 more...)

Neural Information Processing Systems

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Asia > China > Heilongjiang Province > Harbin (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)

Fast Sparse Group Lasso

Neural Information Processing Systems | Oct-11-2024, 00:07:31 GMT

block coordinate descent, fast sparse group lasso, sparse group lasso, (1 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.65)

Convergent Block Coordinate Descent for Training Tikhonov Regularized Deep Neural Networks

Ziming Zhang, Matthew Brand

Neural Information Processing Systems | Oct-3-2024, 13:59:10 GMT

By lifting the ReLU function into a higher dimensional space, we develop a smooth multi-convex formulation for training feed-forward deep neural networks (DNNs). This allows us to develop a block coordinate descent (BCD) training algorithm consisting of a sequence of numerically well-behaved convex optimizations. Using ideas from proximal point methods in convex analysis, we prove that this BCD algorithm will converge globally to a stationary point with R-linear convergence rate of order one. In experiments with the MNIST database, DNNs trained with this BCD algorithm consistently yielded better test-set error rates than identical DNN architectures trained via all the stochastic gradient descent (SGD) variants in the Caffe toolbox.
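The abstract describes a block-wise pattern: fix every block but one, solve a convex subproblem in the free block, then move on to the next. That pattern can be illustrated on a much simpler multi-convex objective. The sketch below is a toy stand-in, not the lifted ReLU formulation or the Tikhonov-regularized training algorithm from the paper. It fits a rank-1 factorization u v^T to a matrix M by alternating exact least-squares updates, and each update is convex in its own block. All names are assumptions.

```python
def loss(M, u, v):
    """Squared reconstruction error sum_ij (M_ij - u_i * v_j)^2."""
    return sum(
        (M[i][j] - u[i] * v[j]) ** 2
        for i in range(len(M))
        for j in range(len(M[0]))
    )


def als_rank1(M, sweeps=5):
    """Block coordinate descent with two blocks, u and v (hypothetical sketch).

    With v fixed, the loss is a convex quadratic in u with a closed-form
    minimizer, and vice versa. Each sweep therefore never increases the loss.
    """
    rows, cols = len(M), len(M[0])
    u = [0.0] * rows
    v = [1.0] * cols
    for _ in range(sweeps):
        vv = sum(x * x for x in v)
        if vv == 0.0:
            break
        # Block 1: exact minimization over u with v held fixed.
        u = [sum(M[i][j] * v[j] for j in range(cols)) / vv for i in range(rows)]
        uu = sum(x * x for x in u)
        if uu == 0.0:
            break
        # Block 2: exact minimization over v with u held fixed.
        v = [sum(M[i][j] * u[i] for i in range(rows)) / uu for j in range(cols)]
    return u, v


if __name__ == "__main__":
    M = [[3.0, 4.0, 5.0], [6.0, 8.0, 10.0]]  # exactly rank 1
    u, v = als_rank1(M)
    print(u, v, loss(M, u, v))
```

The joint problem in (u, v) is non-convex, but every subproblem is a well-behaved convex least-squares fit. A multi-convex formulation exploits the same structure.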

algorithm, convergence, optimization, (13 more...)

Neural Information Processing Systems

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
North America > United States > California > Los Angeles County > Long Beach (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Instructional Material (0.47)

Industry: Education (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.86)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.55)

Differentially Private Neural Network Training under Hidden State Assumption

Chen, Ding, Liu, Chen

arXiv.org Artificial Intelligence | Jul-11-2024

We present a novel approach called differentially private stochastic block coordinate descent (DP-SBCD) for training neural networks with provable guarantees of differential privacy under the hidden state assumption. Our methodology incorporates Lipschitz neural networks and decomposes the training process of the neural network into sub-problems, each corresponding to the training of a specific layer. By doing so, we extend the analysis of differential privacy under the hidden state assumption to encompass non-convex problems and algorithms employing proximal gradient descent. Furthermore, in contrast to existing methods, we adopt a novel approach by utilizing calibrated noise sampled from adaptive distributions, yielding improved empirical trade-offs between utility and privacy.

neural network, noise, privacy loss, (12 more...)

arXiv.org Artificial Intelligence

arXiv:2407.08233

Country:

Asia > China > Hong Kong (0.04)
North America > United States > Washington > King County > Seattle (0.04)
Europe > Belgium > Flanders > Flemish Brabant > Leuven (0.04)

Genre: Research Report (1.00)

Industry: Information Technology > Security & Privacy (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.36)
